AITopics | silhouette coefficient

Collaborating Authors

silhouette coefficient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

753a043674f0193523abc1bbce678686-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-8-2026, 23:30:01 GMT

algorithm, dataset, subspace, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Human-aligned Quantification of Numerical Data

Kolonin, Anton

arXiv.org Artificial IntelligenceNov-21-2025

Quantifying numerical data involves addressing two key challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges of values that correspond to specific value classes, referred to as "quantums," which represent statistically meaningful states. If such quantification is feasible, continuous streams of numerical data can be transformed into sequences of "symbols" that reflect the states of the system described by the measured parameter. People often perform this task intuitively, relying on common sense or practical experience, while information theory and computer science offer computable metrics for this purpose. In this study, we assess the applicability of metrics based on information compression and the Silhouette coefficient for quantifying numerical data. We also investigate the extent to which these metrics correlate with one another and with what is commonly referred to as "human intuition." Our findings suggest that the ability to classify numeric data values into distinct categories is associated with a Silhouette coefficient above 0.65 and a Dip Test below 0.5; otherwise, the data can be treated as following a unimodal normal distribution. Furthermore, when quantification is possible, the Silhouette coefficient appears to align more closely with human intuition than the "normalized centroid distance" method derived from information compression perspective.

artificial intelligence, distribution mode, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.15723

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)

Add feedback

Improving the learning process and providing more accurate similarity matrices for unannotated data can positively

Neural Information Processing SystemsOct-3-2025, 06:37:09 GMT

We sincerely thank the reviewers for their valuable comments. We proofread and fixed the mentioned errors. Related Work: Thank you for the additional references. We will include and discuss them in the revised version. Publishing codes: Upon the acceptance of our paper, we will publicly release the source codes.

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

DCA: Graph-Guided Deep Embedding Clustering for Brain Atlases

Wang, Mo, Peng, Kaining, Tang, Jingsheng, Wen, Hongkai, Liu, Quanying

arXiv.org Artificial IntelligenceSep-23-2025

Brain atlases are essential for reducing the dimensionality of neuroimaging data and enabling interpretable analysis. However, most existing atlases are predefined, group-level templates with limited flexibility and resolution. We present Deep Cluster Atlas (DCA), a graph-guided deep embedding clustering framework for generating individualized, voxel-wise brain parcellations. DCA combines a pretrained autoencoder with spatially regularized deep clustering to produce functionally coherent and spatially contiguous regions. Our method supports flexible control over resolution and anatomical scope, and generalizes to arbitrary brain structures. We further introduce a standardized benchmarking platform for atlas evaluation, using multiple large-scale fMRI datasets. Across multiple datasets and scales, DCA outperforms state-of-the-art atlases, improving functional homogeneity by 98.8% and silhouette coefficient by 29%, and achieves superior performance in downstream tasks such as autism diagnosis and cognitive decoding. We also observe that a fine-tuned pretrained model achieves superior results on the corresponding task. Codes and models are available at https://github.com/ncclab-sustech/DCA .

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.01426

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology > Autism (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(3 more...)

Add feedback

Context Steering: A New Paradigm for Compression-based Embeddings by Synthesizing Relevant Information Features

Durán, Guillermo Sarasa, Fontecha, Ana Granados, Ortíz, Francisco de Borja Rodríguez

arXiv.org Artificial IntelligenceAug-21-2025

Compression-based distances (CD) offer a flexible and domain-agnostic means of measuring similarity by identifying implicit information through redundancies between data objects. However, as similarity features are derived from the data, rather than defined as an input, it often proves difficult to align with the task at hand, particularly in complex clustering or classification settings. To address this issue, we introduce "context steering," a novel methodology that actively guides the feature-shaping process. Instead of passively accepting the emergent data structure (typically a hierarchy derived from clustering CDs), our approach "steers" the process by systematically analyzing how each object influences the relational context within a clustering framework. This process generates a custom-tailored embedding that isolates and amplifies class-distinctive information. We validate the capabilities of this strategy using Normalized Compression Distance (NCD) and Relative Compression Distance (NRC) with common hierarchical clustering, providing an effective alternative to common transductive methods. Experimental results across heterogeneous datasets-from text to real-world audio-validate the robustness and generality of context steering, marking a fundamental shift in their application: from merely discovering inherent data structures to actively shaping a feature space tailored to a specific objective.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.1478

Country: Europe > Spain (0.14)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

Learning Part Knowledge to Facilitate Category Understanding for Fine-Grained Generalized Category Discovery

Wang, Enguang, Peng, Zhimao, Xie, Zhengyuan, Lu, Haori, Yang, Fei, Liu, Xialei

arXiv.org Artificial IntelligenceMar-20-2025

Generalized Category Discovery (GCD) aims to classify unlabeled data containing both seen and novel categories. Although existing methods perform well on generic datasets, they struggle in fine-grained scenarios. We attribute this difficulty to their reliance on contrastive learning over global image features to automatically capture discriminative cues, which fails to capture the subtle local differences essential for distinguishing fine-grained categories. Therefore, in this paper, we propose incorporating part knowledge to address fine-grained GCD, which introduces two key challenges: the absence of annotations for novel classes complicates the extraction of the part features, and global contrastive learning prioritizes holistic feature invariance, inadvertently suppressing discriminative local part patterns. To address these challenges, we propose PartGCD, including 1) Adaptive Part Decomposition, which automatically extracts class-specific semantic parts via Gaussian Mixture Models, and 2) Part Discrepancy Regularization, enforcing explicit separation between part features to amplify fine-grained local part distinctions. Experiments demonstrate state-of-the-art performance across multiple fine-grained benchmarks while maintaining competitiveness on generic datasets, validating the effectiveness and robustness of our approach.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.16782

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Cluster Analysis and Concept Drift Detection in Malware

Mishra, Aniket, Stamp, Mark

arXiv.org Artificial IntelligenceFeb-19-2025

Concept drift refers to gradual or sudden changes in the properties of data that affect the accuracy of machine learning models. In this paper, we address the problem of concept drift detection in the malware domain. Specifically, we propose and analyze a clustering-based approach to detecting concept drift. Using a subset of the KronoDroid dataset, malware samples are partitioned into temporal batches and analyzed using MiniBatch $K$-Means clustering. The silhouette coefficient is used as a metric to identify points in time where concept drift has likely occurred. To verify our drift detection results, we train learning models under three realistic scenarios, which we refer to as static training, periodic retraining, and drift-aware retraining. In each scenario, we consider four supervised classifiers, namely, Multilayer Perceptron (MLP), Support Vector Machine (SVM), Random Forest, and XGBoost. Experimental results demonstrate that drift-aware retraining guided by silhouette coefficient thresholding achieves classification accuracy far superior to static models, and generally within 1% of periodic retraining, while also being far more efficient than periodic retraining. These results provide strong evidence that our clustering-based approach is effective at detecting concept drift, while also illustrating a highly practical and efficient fully automated approach to improved malware classification via concept drift detection.

concept drift, scenario, silhouette coefficient, (14 more...)

arXiv.org Artificial Intelligence

2502.14135

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Add feedback

Clustering Aggregation as Maximum-Weight Independent Set

Nan Li, Longin J. Latecki

Neural Information Processing SystemsFeb-11-2025, 17:55:15 GMT

We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Algorithmic Clustering based on String Compression to Extract P300 Structure in EEG Signals

Sarasa, Guillermo, Granados, Ana, Rodríguez, Francisco B

arXiv.org Artificial IntelligenceJan-31-2025

P300 is an Event-Related Potential widely used in Brain-Computer Interfaces, but its detection is challenging due to inter-subject and temporal variability. This work introduces a clustering methodology based on Normalized Compression Distance (NCD) to extract the P300 structure, ensuring robustness against variability. We propose a novel signal-to-ASCII transformation to generate compression-friendly objects, which are then clustered using a hierarchical tree-based method and a multidimensional projection approach. Experimental results on two datasets demonstrate the method's ability to reveal relevant P300 structures, showing clustering performance comparable to state-of-the-art approaches. Furthermore, analysis at the electrode level suggests that the method could assist in electrode selection for P300 detection. This compression-driven clustering methodology offers a complementary tool for EEG analysis and P300 identification.

artificial intelligence, information management, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.cmpb.2019.03.009

2502.0022

Country: